Abstract. We review the recent approach of correlation based networks of financial equities. We investigate
portfolios of stocks at different time horizons, financial indices and volatility time series, and we show
that meaningful economic information can be extracted from noise-dressed correlation matrices. We show
that the method can be used to falsify widespread market models by directly comparing the topological
properties of networks of real and artificial markets.
1 Introduction

The study of topological properties of networks has recently
received a lot of attention. In particular, it has been shown
that many natural and social systems display unexpected
statistical properties of the links connecting different elements
of the system [?] and therefore cannot be described in terms of
random graphs [?]. The topological properties of several graphs
describing physical and social systems have been recently
investigated. Examples are the World Wide Web [?], the Internet
and social networks [?]. In the networks investigated in these
papers (and in many others) the links represent relations between
nodes which are either present or absent at a given instant of
time. By contrast, we have recently started the investigation of
correlation based networks, i.e. networks used to visualize the
structure of pair cross-correlations among a set of time series.
From a set of n time series one can extract the correlation
coefficient between any pair of variables. If we identify the
different time series with the nodes of the network, each pair of
nodes can be thought of as connected by an arc with a weight
related to the correlation coefficient between the two time
series. The network is therefore completely connected. By
introducing a suitable filtration of the network one can remove
the less relevant information by removing the weakest links. In
fact, it is known that the finiteness of time series can introduce
spurious correlations. In principle there are many different ways
of filtering the correlation matrix in order to obtain noise
filtered information. In this context we have focused mainly on
financial markets [?] and on a particular type of network that
can be obtained from the correlation matrix, specifically the
minimum spanning tree. Spanning trees are particular types of
graphs that connect all the vertices in a graph without forming
any loop.

The presence of a high degree of cross-correlation between
the synchronous time evolution of a set of equity returns is a
well known empirical fact observed in financial markets
[10,11,12]. For a time horizon of one trading day, correlation
coefficients as high as 0.7 can be observed for some pairs of
equity returns belonging to the same economic sector.

The study of the cross-correlation of a set of financial
equities also has practical importance, since it can improve the
ability to model composite financial entities such as, for
example, stock portfolios. There are different approaches to
address this problem. The most common one is the principal
component analysis of the correlation matrix of the data [?].
Recently, an investigation of the properties of the correlation
matrix has been performed by physicists by using the perspective
and theoretical results of random matrix theory [14,15]. As
mentioned above, another approach is correlation based clustering
analysis, which allows one to obtain clusters of stocks starting
from the time series of price returns. Different algorithms exist
to perform cluster analysis in finance [8,16,17,18,19,20].

In previous work, some of us have shown that a specific
correlation based clustering method gives a meaningful taxonomy
for stock return time series [8,21,22], for market index returns
of worldwide stock exchanges [23] and for volatility increments
of stock return time series [24]. Here we review the results
obtained in these previous studies and discuss them from a unified
perspective. Specifically, Sect. 2 discusses the correlation based
clustering method, Sect. 3 focuses on the properties of networks
detected in a portfolio of stocks when stock returns are sampled
at different time horizons, Sect. 4 discusses the properties of
networks observed by investigating stock indices of stock
exchanges located all over the world, and Sect. 5 discusses the
case of financial networks obtained starting from volatility time
series. Sect. 6 is about the comparison of topological properties
of real data with those of simple and widespread models of market
activity. Finally, in Sect. 7 we draw our conclusions.



2 A financial network obtained by a 
correlation-based filtering procedure 

In Ref. [?], a correlation based method was proposed that is
able to detect economic information present in a correlation
coefficient matrix. This method is a filtering procedure based on
the estimation of the subdominant ultrametric [25] associated with
a metric distance obtained from the correlation coefficient matrix
of a set of n stocks. This procedure, already used in other
fields, allows one to obtain a metric distance and to extract from
it a minimum spanning tree (MST) and a hierarchical tree by means
of a well defined algorithm known as the nearest neighbor single
linkage clustering algorithm. This reveals geometrical (through
the MST) and taxonomic (through the hierarchical tree) aspects of
the correlation present among stocks.

The network is obtained by filtering the relevant information
present in the correlation coefficient matrix of the original
time series of stock returns. This is done (i) by determining the
synchronous correlation coefficient of the difference of the
logarithm of stock prices computed at a selected time horizon,
(ii) by calculating a metric distance between all the pairs of
stocks and (iii) by selecting the subdominant ultrametric distance
associated with the considered metric distance. The subdominant
ultrametric is the ultrametric structure closest to the original
metric structure [?].

The correlation coefficient is defined as

$$\rho_{ij}(\Delta t) = \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sqrt{\left(\langle r_i^2 \rangle - \langle r_i \rangle^2\right)\left(\langle r_j^2 \rangle - \langle r_j \rangle^2\right)}} \qquad (1)$$

where i and j are numerical labels of the stocks,
$r_i = \ln P_i(t) - \ln P_i(t - \Delta t)$, $P_i(t)$ is the value of
the price of stock i at the trading time t and $\Delta t$ is the
time horizon which is, in the present Section, one trading day.
The correlation coefficient for logarithm price differences (which
almost coincide with stock returns) is computed between all the
possible pairs of stocks present in the considered portfolio. The
empirical statistical average, indicated in this paper with the
symbol $\langle \cdot \rangle$, is here a temporal average always
performed over the investigated time period.

By definition, $\rho_{ij}(\Delta t)$ can vary from -1 (completely
anti-correlated pair of stocks) to 1 (completely correlated pair
of stocks). When $\rho_{ij}(\Delta t) = 0$ the two stocks are
uncorrelated. The matrix of correlation coefficients is a
symmetric matrix with $\rho_{ii}(\Delta t) = 1$ on the main
diagonal. Hence, for each value of $\Delta t$, $n(n-1)/2$
correlation coefficients characterize each correlation coefficient
matrix completely.

A metric distance between pairs of stocks can be rigorously
determined by defining

$$d_{i,j}(\Delta t) = \sqrt{2\,(1 - \rho_{ij}(\Delta t))}. \qquad (2)$$

With this choice $d_{i,j}(\Delta t)$ fulfills the three axioms of a
metric: (i) $d_{i,j}(\Delta t) = 0$ if and only if $i = j$; (ii)
$d_{i,j}(\Delta t) = d_{j,i}(\Delta t)$ and (iii)
$d_{i,j}(\Delta t) \le d_{i,k}(\Delta t) + d_{k,j}(\Delta t)$. The
distance matrix $D(\Delta t)$ is then used to determine the MST
connecting the n stocks.
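
As a minimal sketch of the two definitions above (the correlation coefficient of logarithmic price differences and the metric distance derived from it), the following pure-Python fragment may help. The function names and the toy inputs are illustrative assumptions, not taken from the cited works.

```python
import math

def log_returns(prices):
    """r(t) = ln P(t) - ln P(t - dt) for consecutive price samples."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

def correlation(x, y):
    """Correlation coefficient built from temporal averages <.>."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mx * my
    vx = sum(a * a for a in x) / n - mx * mx
    vy = sum(b * b for b in y) / n - my * my
    return cov / math.sqrt(vx * vy)

def distance(rho):
    """Metric distance d = sqrt(2 (1 - rho)); the clamp guards against rounding."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))

def distance_matrix(price_series):
    """Full n x n distance matrix from a list of price time series."""
    returns = [log_returns(p) for p in price_series]
    n = len(returns)
    return [[0.0 if i == j else distance(correlation(returns[i], returns[j]))
             for j in range(n)] for i in range(n)]
```

With this definition a perfectly correlated pair sits at distance 0, an uncorrelated pair at sqrt(2) and a perfectly anti-correlated pair at 2.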

The MST, a theoretical concept of graph theory, is the
spanning tree of shortest length. A spanning tree is a graph
without loops connecting all the n nodes with $n - 1$ links. We
have seen that the original fully connected graph is metric with
distance $d_{i,j}$, which decreases as $\rho_{ij}$ increases.
Therefore the MST selects the $n - 1$ strongest (i.e. shortest)
links that together span all the nodes without forming loops. The
MST allows one to obtain, in a direct and essentially unique way,
the subdominant ultrametric distance matrix $D^<(\Delta t)$ and
the hierarchical organization of the elements (stocks in our
case) of the investigated data set.

The subdominant ultrametric distance between objects i and j,
i.e. the element $d^<_{i,j}$ of the $D^<(\Delta t)$ matrix, is the
maximum value of the metric distance $d_{k,l}$ detected by moving
in single steps from i to j through the path connecting i and j
in the MST. The method of constructing a MST linking a set of n
objects is direct and it is known in multivariate analysis as the
nearest neighbor single linkage cluster analysis [?]. A
pedagogical exposition of the determination of the MST in the
context of financial time series is provided in Ref. [22].
Subdominant ultrametric space [25] has been fruitfully used in
the description of frustrated complex systems. The archetype of
this kind of system is a spin glass [?].
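
The construction described above can be sketched as follows. This is only an illustration under stated assumptions: the MST is built here with Prim's algorithm (which yields the same tree as single linkage when all distances are distinct), and the function names are hypothetical.

```python
def minimum_spanning_tree(d):
    """Prim's algorithm on a full distance matrix; returns n-1 edges (i, j, d_ij)."""
    n = len(d)
    in_tree = [False] * n
    best = [float("inf")] * n      # cheapest known link from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, d[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and d[u][v] < best[v]:
                best[v] = d[u][v]
                parent[v] = u
    return edges

def subdominant_ultrametric(n, edges):
    """d<_ij = largest single-step distance on the MST path joining i and j."""
    adj = {i: [] for i in range(n)}
    for i, j, w in edges:
        adj[i].append((j, w))
        adj[j].append((i, w))
    ultra = [[0.0] * n for _ in range(n)]
    for s in range(n):
        stack = [(s, -1, 0.0)]     # (node, previous node, max edge so far)
        while stack:
            node, prev, m = stack.pop()
            ultra[s][node] = m
            for nxt, w in adj[node]:
                if nxt != prev:
                    stack.append((nxt, node, max(m, w)))
    return ultra
```

By construction every ultrametric entry is no larger than the corresponding metric distance, since the direct link between i and j is never shorter than the longest step on their MST path.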

As an example of the results obtained with this method, here
we briefly discuss the results obtained in Ref. [21] by
investigating a set of 100 highly capitalized stocks traded in
the major US equity markets during the period January 1995 -
December 1998. At that time, most of them were used to compute
the Standard and Poor's 100 index. The prices are transaction
prices stored in the Trade and Quote database of the New York
Stock Exchange.

The time horizons investigated in the cited study vary
from $\Delta t = d$ = 6 h and 30 min (a trading day time
interval) to $\Delta t = d/20$ = 19 min and 30 s.

In Fig. 1 we show the minimal spanning tree obtained in this
investigation with a time horizon equal to one trading day.
Stocks are identified with their tick symbols. Information about
the company indicated by each tick symbol can easily be found on
several financial web pages such as, for example,
http://www.quicken.com. Clusters of stocks which are homogeneous
with respect to the economic sectors of firms are clearly
observed. Prominent examples of clusters are those of (i) oil
companies, which is, to be precise, a cluster composed of two
separate subclusters, one including the companies SLB, HAL, BHI,
CGP and WMB (companies which provide financial services to the
oil industry and companies of the gas industry) and the other
one including MOB, CHV, XON, ARC, OXY (companies of the oil
industry); (ii) financial (JPM, BAC, MER, USB, ONE, WFC, APX,
etc.) and consumer/non-cyclical companies (KO, GE, PG, CL, AVP,
JNJ, etc.); (iii) technology companies (MSFT, INTC, TXN, CSCO,
NSM, IBM, HWP, ORCL); (iv) basic materials companies (AA, WY,
CHA, IP, BCC), and (v) utility companies (BEL, AIT, GTE, SO, AEP,
UCM, ETR).

Bonanno et al.: Networks of equities in financial markets

Fig. 1. Minimum spanning tree of 100 highly capitalized stocks
traded in the US equity markets. The filtering procedure has
been obtained by considering the correlation coefficient of stock
return time series computed at a 1 trading day time horizon
(6 h and 30 min). Each circle represents a stock labeled by
its tick symbol. The minimum spanning tree presents a large
number of stocks having a single link and some stocks having
several links. Some of these stocks act as a "hub" of a local
cluster. Examples are INTC and CSCO for technology stocks,
AIG, BAC and MER for financial stocks and AEP for utilities
stocks. The stock GE (General Electric Co.) links a relatively
large number of stocks belonging to various sectors.

Equity time series therefore carry economic information
which can be detected by using specialized filtering procedures.
In other words, price time series in a financial market reflect
information about the economic sector of activity of the company.
This information is usually dressed by the noise due to
statistical fluctuations. Filtering procedures, like the one we
are proposing, are able to undress the signals from the noise and
reveal the more relevant information.

3 Minimal spanning trees of stock portfolios 
at different time horizons 

In this section we discuss how the correlation structure of a
portfolio of stocks changes when the time horizon used to compute
the correlation coefficient is progressively decreased to an
intraday time scale. It has been known since 1979 that the degree
of cross-correlation diminishes when the time horizon used to
compute stock returns is shortened [31]. This phenomenon is
sometimes referred to as the "Epps effect". The existence of this
phenomenon motivates us to investigate the nature and the
properties of the network associated with a given financial
portfolio as a function of the time horizon used to record stock
return time series.

In Ref. [21], some of us used the high-frequency data of the
transactions occurring in the US equity markets which are
recorded in the Trade and Quote database of the New York Stock
Exchange. By using this database we are able to investigate
comovements of a set of highly capitalized stocks for daily and
intradaily time horizons.

A clear modification of the hierarchical organization of the
set of stocks investigated is detected when one changes the time
horizon used to determine stock returns. The structure of the
considered set of 100 US stocks changes its nature, moving from a
complex organization to a progressively more elementary one, when
the time horizon of price changes varies from d = 23400 s to
d/20, where d is the daily time horizon at the New York Stock
Exchange. The amount of information processed consists of about
100 million transactions. The time horizons investigated are
$\Delta t = d$ = 6 h and 30 min (a trading day time interval),
$\Delta t = d/2$ = 3 h and 15 min, $\Delta t = d/5$ = 1 h and 18
min, $\Delta t = d/10$ = 39 min and $\Delta t = d/20$ = 19 min and
30 s. The shortest time horizon was chosen in order to
statistically ensure that for each stock at least one transaction
occurs during the time horizon $\Delta t$. The daily mean number
of transactions for the 100 selected stocks ranges from 11944.3
transactions for Intel Corp. (INTC) to 121.48 transactions for
Mallinckrodt Inc. New (MKG).
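
The horizons listed above follow from dividing the trading day d = 23400 s, and returns at a longer horizon can be obtained by summing consecutive logarithmic returns, since they are additive. The short sketch below illustrates both points; the helper names are hypothetical and the non-overlapping windowing is an assumption made for illustration, not necessarily the procedure of the cited study.

```python
TRADING_DAY_S = 23400  # d = 6 h 30 min, as stated in the text

def horizon(divisor):
    """Return d/divisor in seconds and as an (hours, minutes, seconds) triple."""
    seconds = TRADING_DAY_S / divisor
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return seconds, (h, m, s)

def aggregate_returns(returns, k):
    """Sum k consecutive log returns (non-overlapping windows) to get
    returns at a horizon k times longer; an incomplete tail is dropped."""
    return [sum(returns[i:i + k]) for i in range(0, len(returns) - k + 1, k)]

for div in (1, 2, 5, 10, 20):
    print(div, horizon(div))
```

For example, twenty consecutive d/20 returns of one day sum to that day's return at the horizon d.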

The 'Epps effect' predicts that the intra-sector pair
correlation decreases when the time horizon $\Delta t$ decreases.
In Ref. [2], the authors show that the mean correlation
coefficient $\langle \rho \rangle$, obtained by averaging over the
$n(n-1)/2$ off-diagonal elements of the correlation coefficient
matrix, decreases when $\Delta t$ decreases. The most prominent
correlation weakening is observed for the most correlated pairs
of stocks (the ones having a correlation coefficient close to the
maximum value $\rho_{max}$). In fact, $\rho_{max}$ decreases from
0.76 to 0.52 when $\Delta t$ changes from 6 h and 30 min to 19
min and 30 s.

The decrease of the correlation between pairs of stocks
affects the nature of the hierarchical organization of the
correlation based network. The clusters observed in Fig. 1
progressively disappear and the arrangement of the minimum
spanning tree moves from a structured and clustered graph to a
simpler star-like graph. Fig. 2 shows the MSTs observed at
different time horizons ranging from d/20 to d/2. The change of
structure of the MST is indeed dramatic if one considers the role
of some highly connected stocks such as, in the present case, GE.
This stock has a degree, i.e. a coordination number, equal to 20
when $\Delta t = d/2$ = 3 h and 15 min, whereas this number grows
to 61 when the time horizon is decreased to $\Delta t = d/20$ =
19 min and 30 s.

Fig. 2. Minimum spanning tree of 100 highly capitalized stocks traded in the US equity markets. The time scale $\Delta t$ used to
compute the correlation coefficients between stocks is smaller than one trading day. Specifically we show the MST obtained
with $\Delta t = d/20$ (top left), $\Delta t = d/10$ (top right), $\Delta t = d/5$ (bottom left), and $\Delta t = d/2$ (bottom right), where d is one trading
day.

It is worth pointing out that the change in the structure of
the MST and hierarchical tree is not just a simple consequence of
the 'Epps effect'. In fact, the changes observed in the structure
of the MST suggest that the intrasector correlation decreases
faster than the intersector correlation between pairs of stocks
of the considered portfolio on an intraday time scale [?]. These
results show that the topology of a correlation based network can
be affected by the sampling time used to monitor the time
evolution of the system. In other words, the system presents a
nontrivial fast dynamics of stock returns, which realizes the
complex process of price formation occurring in a financial
market.



4 The network of global financial market 

A correlation based network can also be obtained by
investigating index returns of stock exchanges located around the
world [23]. It is worth pointing out that the study of the
dynamics of stock exchange indices located all over the world
presents additional difficulties with respect to the dynamics of
a portfolio of stocks traded in a single stock market. To cite
just two of the most prominent ones: (i) stock markets located
all over the world have different opening and closing hours; and
(ii) transactions in different markets are done by using
different currencies that themselves fluctuate with respect to
one another. It is then important to quantify the degree of
similarity between the dynamics of stock indices of
nonsynchronous markets trading in different currencies.

Ref. [23] investigates two sets of data: (i) the
nonsynchronous time evolution of n = 24 daily stock market
indices computed in local currencies during the time period from
January 1988 to December 1996, and (ii) the closure values of the
51 Morgan Stanley Capital International (MSCI) country indices
computed daily in local currencies or in US dollars in the time
period from January 1996 to December 1999. The stock indices used
in this research belong to stock markets distributed all over the
world in five continents. Here we briefly discuss the results
obtained with the set of MSCI daily indices computed in local
currencies.
Fig. 3. MST of 51 stock exchanges obtained by performing a
correlation based clustering starting from MSCI index returns
computed in local currencies and by using a time horizon of
one week.



An analysis of daily data of closure values recorded around
the world may induce spurious correlations introduced just by the
different closure times of different markets. The effects of
nonsynchronous trading in time series analysis are well
documented in the economic literature [32,33,34]. In fact,
different degrees of correlation between the New York and Tokyo
markets are estimated depending on whether one considers the
closure-closure returns of the two markets or the closure-opening
returns. In particular, it has been empirically detected that the
highest degree of correlation between these two markets is
observed between the open-closure return of the New York stock
exchange at day t and the opening-closure return of the Tokyo
stock market at day t + 1.
Ref. [23] overcomes this intrinsic limitation by considering
a weekly time horizon, so that the nonsynchronous hourly mismatch
of index data is minimized. The correlation coefficient is
computed between all the possible pairs of indices present in the
database. As usual, the statistical average is a temporal average
performed over all the trading weeks of the investigated time
period. The authors obtain the n × n matrix of correlation
coefficients for weekly logarithm index differences. The 51
indices investigated in Ref. [23] belong to 51 different
countries. They comprise the so-called emerged and emerging
markets. The indices and their symbols can be found at the web
site http://www.mscidata.com. The data are daily data and cover
the period 1996-1999. In Fig. 3 we show the result of the
analysis performed in Ref. [23].

The graph of Fig. 3 shows a clear regional clustering. In
fact, one can easily note a European cluster linked to the North
American stock exchanges. These last stock exchanges are linked
to the Australian and New Zealand stock exchanges. The clusters
of South American and Asian (with the exception of Japan) stock
exchanges are also clearly recognizable. Once again, the
correlation based network shows clusters organized with respect
to an ordering principle, which is in this case the regional
location of stock exchanges. However, the topological properties
of the graph are quite different from those observed for stock
returns of a portfolio traded in a financial market. In fact, the
graph is characterized by a low average degree of its elements.
Moreover, differently from the case of the portfolio of stocks,
the elements characterized by a relatively high coordination
number do not coincide with the most capitalized stock exchanges.

In summary, Ref. [23] has shown that sets of stock index time
series located all over the world can provide a correlation based
network that shows a regional clustering but is characterized by
topological properties quite different from those observed in a
portfolio of stocks traded in the same financial market.



5 Networks of volatility time series 

Another investigation has been devoted to detecting the
network of relations present among volatility time series of
stock prices traded in a financial market. Volatility is a key
financial quantity controlling the risk profile of a given
financial asset traded in a market [12].

Fig. 4. Minimum spanning tree obtained by considering the
volatility time series of the 93 most capitalized stocks traded in
the US equity markets in August 1998. Each stock is identified by
its tick symbol. The correspondence with the company name can be
found in any web site of financial information. The volatility
correlation among stocks has been evaluated by using the Spearman
rank-order correlation coefficient. The MST has been drawn by
using the Pajek package for large network analysis
http://vlado.fmf.uni-lj.si/pub/networks/pajek/

In Ref. [24] some of us investigate the statistical
properties of the cross-correlation of volatility time series for
the 93 most capitalized stocks traded in US equity markets during
a 12 year time period. Data cover the whole period ranging from
January 1987 to April 1999 (3116 trading days). In the cited study
daily data are considered. In particular, the authors use for the
analysis the open, close, high and low prices recorded for each
trading day for each considered stock. Starting from the daily
price data, the volatility $\sigma_i(t)$ is computed by using the
proxy
$\sigma_i(t) = 2\,[\max\{P_i(t)\} - \min\{P_i(t)\}] / [\max\{P_i(t)\} + \min\{P_i(t)\}]$,
where $\max\{P_i(t)\}$ and $\min\{P_i(t)\}$ are respectively the
highest and lowest price of the stock i at day t. It should be
noted that there is an essential difference between price return
and volatility probability density functions. In fact, the
probability density function of price returns is an approximately
symmetrical function whereas the volatility probability density
function is significantly skewed. Bivariate variables whose
marginals are very different from Gaussian functions can have
linear correlation coefficients which are bounded in a
subinterval of [-1, 1] [35]. Since the empirical probability
density function of volatility is very different from a Gaussian,
the use of a robust nonparametric correlation coefficient is more
appropriate for quantifying volatility cross-correlation. In
fact, the volatility MSTs obtained starting from a Spearman
rank-order correlation coefficient are more stable than the ones
obtained starting from the linear (or Pearson's) correlation
coefficient [?]. An example of the MST obtained starting from the
volatility time series and by using the Spearman rank-order
correlation coefficient is shown in Fig. 4. A direct inspection
of the MST shows the existence of well characterized clusters.
Examples are the cluster of technology companies (HON, HWP, IBM,
INTC, MSFT, NSM, ORCL, SUNW, TXN and UIS) and the cluster of
energy companies (ARC, CHV, CPB, HAL, MOB, SLB, XON). As already
observed in the MST obtained from the price return time series,
the volatility MST of Fig. 4 shows the existence of highly
connected stocks. Examples are GE, JPM, and DD. The topology of
the network is not too different from the topology of the network
obtained from return time series sampled at the same time horizon
(Fig. 1). Investigations on large sets of stocks would be needed
to estimate if a quantifiable topological difference exists
between return and volatility correlation based networks.
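
The two ingredients of this section, the high-low volatility proxy and the Spearman rank-order correlation, can be sketched in a few lines. This is an illustrative pure-Python version under our own assumptions: the function names are hypothetical, and ties are given average ranks, a common convention that the text does not specify.

```python
import math

def volatility_proxy(high, low):
    """sigma = 2 (max P - min P) / (max P + min P) for one trading day."""
    return 2.0 * (high - low) / (high + low)

def ranks(values):
    """1-based ranks; tied values share their average rank (assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    """Linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman rank-order correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Because it depends only on ranks, the Spearman coefficient equals 1 for any strictly increasing relation between two series, whereas the linear coefficient does not; this is one reason it is less sensitive to skewed marginals.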



6 Topology of networks in financial markets 

In the previous sections, we have discussed the shape and
topology of several networks obtained by using a correlation
based clustering procedure. In all cases, the networks carry a
clear economic meaning. However, a difference in the topological
properties is sometimes observed when the set of data ranges from
stock portfolios to a set of stock indices or to the volatility
time series of a stock portfolio. The topological properties are
also sensitive to the sampling time of the time series used to
compute the correlation coefficient matrix. It is therefore worth
investigating more deeply the relation between the topological
properties of correlation based networks and some simple but
widespread market models.

Fig. 5. MST of real data from daily stock returns of 1071
stocks for the 12-year period 1987-1998. The node symbol is
based on the Standard Industrial Classification system. For the
correspondence see the text.

In Ref. [22] some of us compare the topological properties of
the MST of empirical data recorded at the New York Stock Exchange
with MSTs obtained from simple models of the portfolio dynamics.
Specifically, the authors consider a model of uncorrelated
Gaussian return time series and the widespread one-factor model.
This last model is the starting point of the Capital Asset
Pricing Model [?]. The topological characterization of the
correlation based MST of real data was originally investigated in
Ref. [?]. In that study, the authors investigated a portfolio of
approximately 6000 stocks by estimating the correlation
coefficient over a yearly time period by using approximately 250
daily data. Here we discuss the results obtained in the study of
Ref. [22], where the authors use a smaller number of stocks n and
a larger number of daily records T. This choice is motivated by
the requirement that the correlation matrix be positive definite.
In fact, when the number of variables is larger than the number
of time records the covariance matrix is only positive
semi-definite [?].
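
The last statement can be checked on a toy case. The sketch below (hypothetical helper names, deliberately tiny numbers) builds the covariance matrix of three series: with only two time records it is singular, hence only positive semi-definite, whereas with four records and linearly independent series its determinant is strictly positive.

```python
def covariance_matrix(series):
    """Covariance matrix of a list of equally long time series."""
    T = len(series[0])
    means = [sum(s) / T for s in series]
    return [[sum((a - ma) * (b - mb) for a, b in zip(sa, sb)) / T
             for sb, mb in zip(series, means)]
            for sa, ma in zip(series, means)]

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# n = 3 variables, T = 2 records: the centered data span one dimension only.
short = covariance_matrix([[1, 3], [2, 5], [4, 0]])
# n = 3 variables, T = 4 records: generic data give a full-rank matrix.
long_ = covariance_matrix([[1, 2, 3, 5], [2, 1, 4, 3], [0, 1, 0, 2]])
print(det3(short), det3(long_))
```

A vanishing determinant signals a zero eigenvalue, which is the reason for preferring more time records than variables.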

The data set used in Ref. [22] consists of daily closure
prices for 1071 stocks traded at the NYSE and continuously
present in the 12-year period 1987-1998 (3030 trading days). The
ratio $T/n \simeq 2.83$ is significantly larger than one and the
correlation matrix is positive definite. Fig. 5 shows the MST of
the real data. The symbol code is chosen according to the main
industry sector of each firm in the Standard Industrial
Classification system, and the correspondence is reported in the
figure caption. Again, regions corresponding to different sectors
are clearly seen on a very large scale. Examples are clusters of
companies belonging to the financial sector (white diamonds), to
the transportation, communications, electric, gas and sanitary
services sector (black squares) and to the mining sector (white
circles). The mining sector companies are observed to belong to
two subsectors, one containing oil companies (located on the
right side of the figure) and one containing gold companies (left
side of the figure).

Fig. 6. MST obtained by a realization of a random model of
1071 Gaussian uncorrelated time series of length 3030.

The empirical MST of real data can be compared with the
results obtained from simple models of the simultaneous dynamics
of a portfolio of assets. The simplest model assumes that the
return time series are uncorrelated Gaussian time series, i.e.
$r_i(t) = \epsilon_i(t)$, where $\epsilon_i(t)$ are Gaussian
random variables with zero mean and unit variance. This type of
model has been considered in Ref. [14,15] as a null hypothesis in
the study of the spectral properties of the correlation matrix.
It is well known both in the financial and in the econophysics
literature that a random model does not explain the empirical
observations of financial time series. This conclusion is
consistent with the observation that the topological properties
of MSTs of random market models are quite different from the ones
obtained from real data. In the MST obtained with the random
model, few nodes have a degree larger than a few units. In Fig. 6
we show one of these MSTs obtained for an artificial market
described by a random model. In Fig. 6 it is clear that the MST
is composed of long chains of nodes. These chains join at nodes
of connectivity equal to a few units (the typical maximal value
observed is close to 7). In other words, a market based on a
random model has a network characterized by a topology
essentially different from the one observed in real data.
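
A small-scale version of this random-model experiment can be sketched as follows. The sizes (30 series of length 200), the seed and the function names are our own illustrative assumptions, chosen so that the pure-Python code runs quickly; the MST is built with Prim's algorithm.

```python
import math
import random

def correlation(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mx * my
    vx = sum(a * a for a in x) / n - mx * mx
    vy = sum(b * b for b in y) / n - my * my
    return cov / math.sqrt(vx * vy)

def mst_edges(d):
    """Prim's algorithm; returns the n-1 links (i, j) of the MST."""
    n = len(d)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v] and d[u][v] < best[v]:
                best[v], parent[v] = d[u][v], u
    return edges

def random_model_mst(n_series=30, length=200, seed=0):
    """MST and node degrees for uncorrelated Gaussian return series."""
    rng = random.Random(seed)
    r = [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(n_series)]
    d = [[0.0] * n_series for _ in range(n_series)]
    for i in range(n_series):
        for j in range(i + 1, n_series):
            rho = correlation(r[i], r[j])
            d[i][j] = d[j][i] = math.sqrt(max(0.0, 2.0 * (1.0 - rho)))
    edges = mst_edges(d)
    degree = [0] * n_series
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return edges, degree

edges, degree = random_model_mst()
print("largest degree in this realization:", max(degree))
```

Inspecting `max(degree)` over several seeds gives a rough feeling for how rarely highly connected nodes appear when the only correlations are finite-sample noise.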

A better modeling of the dynamics of a portfolio is obtained
by using the one-factor model. The one-factor model assumes that
the return of assets is controlled by a single factor (or index).
Specifically, for any asset i we have

$$r_i(t) = \alpha_i + \beta_i r_M(t) + \epsilon_i(t), \qquad (3)$$

where $r_i(t)$ and $r_M(t)$ are the returns of the asset i and of
the market factor at day t respectively, $\alpha_i$ and $\beta_i$
are two real parameters and $\epsilon_i(t)$ is a zero mean
Gaussian noise term characterized by a variance equal to
$\sigma^2_{\epsilon_i}$. The parameters of the model can be
obtained from the real data by the ordinary least squares method.
Our choice for the market factor is the Standard & Poor's 500
index. The one-factor model is able to reproduce quite well the
distribution of correlation coefficients of the real data. In
Fig. 7 we show the probability density function of the
correlation coefficients for real data and for the one-factor
model. It is worth noting that the one-factor model is able to
explain more than 80% of the correlation coefficients observed in
real data. Therefore one could naively expect that the
correlation based MST of the one-factor model is also quite
similar to the correlation based MST of the real data.
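
As an illustration of the one-factor model and of its ordinary least squares fit, consider the following sketch. The function names, the parameter values and the synthetic market factor are assumptions made for the example; in the study discussed here the market factor is an actual index and the parameters are fitted to real returns.

```python
import random

def simulate_asset(alpha, beta, sigma_eps, market, rng):
    """r_i(t) = alpha_i + beta_i r_M(t) + eps_i(t), with Gaussian eps_i."""
    return [alpha + beta * rm + rng.gauss(0.0, sigma_eps) for rm in market]

def ols_fit(asset, market):
    """Ordinary least squares estimates of alpha, beta and residual variance."""
    T = len(market)
    m_a, m_m = sum(asset) / T, sum(market) / T
    cov = sum((a - m_a) * (m - m_m) for a, m in zip(asset, market)) / T
    var = sum((m - m_m) ** 2 for m in market) / T
    beta = cov / var
    alpha = m_a - beta * m_m
    resid_var = sum((a - alpha - beta * m) ** 2
                    for a, m in zip(asset, market)) / T
    return alpha, beta, resid_var

rng = random.Random(1)
market = [rng.gauss(0.0, 0.01) for _ in range(500)]     # synthetic market factor
asset = simulate_asset(0.001, 1.2, 0.02, market, rng)   # hypothetical parameters
print(ols_fit(asset, market))
```

Surrogate data for a whole portfolio can be generated by repeating the simulation with the fitted parameters of each asset and a common market series.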

Fig. 7. Empirical probability density function of the correlation
coefficients of a portfolio of 1071 stocks traded at NYSE
in the 12-year period 1987-1998 (continuous line). The dashed
line is the corresponding probability density function of a
realization of the one-factor model with parameters fitted from
real data.

Fig. 8. MST of a numerical simulation of the one-factor
model. The symbol code is the same as used in Fig. |S|

On the contrary, the MST obtained with the one-factor
model is very different from the one obtained from real
data. In Fig. 8 we show the MST obtained in a typical
realization of the one-factor model performed with the control
parameters obtained as described above. It is evident that
the structure of sectors of Fig. [Sj is not present in Fig. 8.
In fact the MST of the one-factor model has a star-like
structure with a central node. The largest fraction of nodes
links directly to the central node and a smaller fraction is
composed of the next-nearest neighbors. Very few nodes
are found at a distance of three links from the central
node. The central node corresponds to General Electric
and the second most connected node is Coca Cola. It is
worth noting that these two stocks are also the two most highly
connected nodes in the real MST. The reason for the
difference between the real and the one-factor model MST
(despite the similarity in the distribution of the correlation
coefficients) is attributable to the noise dressing. A large
fraction of the correlation coefficients is heavily dressed by
noise due to the finite length of the time series. The effect of
dressing is similar in real and in surrogate time series because
the two sets of time series have been chosen to have the same
length. On the other hand, the method used to obtain the MST
selects part of the relevant information of the correlation
matrix, discarding the information more heavily dressed
by the noise. The MST procedure therefore undresses the
correlation matrix, revealing the great differences between
real and model data. We want to stress that the difference
in topology between the MSTs can be made more quantitative.
In Ref. [21] some of us conducted numerical simulations
to show that some topological quantities (the degree
and the in-degree distribution) of the real and one-factor MSTs
are different with 95% statistical confidence.
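
The star-like structure described above can be reproduced with a small numerical sketch. It assumes the distance $d_{ij} = \sqrt{2(1-\rho_{ij})}$ commonly used for correlation based MSTs, builds the tree with Prim's algorithm, and compares the MST of the exact one-factor correlation matrix $\rho_{ij} = c_i c_j$ with the MST of a finite sample. The function names, the parameter values and the vector `c` (the correlation of each asset with the factor) are hypothetical choices made for illustration only.

```python
import numpy as np

def correlation_distance(rho):
    # Assumed metric: d_ij = sqrt(2 * (1 - rho_ij)), clipped against rounding.
    return np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None))

def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense distance matrix; returns a list of (i, j) edges."""
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best_dist = dist[0].copy()           # cheapest known link from the tree to each node
    best_from = np.zeros(n, dtype=int)   # tree endpoint of that link
    edges = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best_dist)
        j = int(np.argmin(candidates))   # closest node not yet in the tree
        edges.append((int(best_from[j]), j))
        in_tree[j] = True
        closer = (dist[j] < best_dist) & ~in_tree
        best_dist[closer] = dist[j][closer]
        best_from[closer] = j
    return edges

def degrees(edges, n):
    """Number of links attached to each node of the tree."""
    deg = np.zeros(n, dtype=int)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg

rng = np.random.default_rng(1)
n, T = 40, 500
c = rng.uniform(0.3, 0.8, size=n)        # hypothetical asset-factor correlations

# Exact (noise-free) one-factor correlation matrix: rho_ij = c_i * c_j.
rho_exact = np.outer(c, c)
np.fill_diagonal(rho_exact, 1.0)
deg_exact = degrees(minimum_spanning_tree(correlation_distance(rho_exact)), n)

# Finite-sample estimate of the same matrix from T observations.
factor = rng.normal(size=T)
returns = np.outer(factor, c) + rng.normal(size=(T, n)) * np.sqrt(1.0 - c**2)
rho_sample = np.corrcoef(returns, rowvar=False)
deg_sample = degrees(minimum_spanning_tree(correlation_distance(rho_sample)), n)
```

With the exact matrix every node is closest to the asset with the largest `c`, so the tree is a perfect star. With a finite sample the estimation noise is expected to move some nodes away from the hub, which is the qualitative picture of Fig. 8. Comparing `deg_sample` with the degrees of a real-data MST is one simple way to make the topological difference quantitative.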

In summary, the investigation of the topological prop- 
erties of correlation based networks is able to discriminate 
between real data and artificial data obtained with simple 
but widespread market models. 



7 Conclusions 

Correlation based networks can be obtained in financial
markets by investigating a certain number of different
financial time series. Here we have reviewed results obtained
by us in different studies. Specifically, the studies discussed
concern returns of stocks traded in a financial market at
fixed or variable time horizon, volatility
time series and index returns of stock exchanges located
all over the world. The networks are obtained with a
well-defined filtering procedure jHj, which mainly focuses on
the most relevant correlations among stocks. Different
filtering procedures have been proposed by different authors
[17,18,19,20] and reveal different aspects of the information
stored in the investigated sets. The robustness over
time of the MST characteristics has been investigated in a
series of studies [!^:^7IIH8II89II24| . The filtering approach
based on the MST can also be used to consider aspects
of portfolio optimization [40] and to perform a correlation
based classification of relevant economic entities such as
banks 0J and hedge funds 021.

The topology of the correlation based networks depends
on the investigated set and on the details of the
investigation (an example is the dependence on the time
horizon used to compute the stock returns observed in the
investigation discussed in Sect. 3). The observed topology
ranges from the star-like one of the top-left panel of
Fig. 2 to the complex multi-cluster structure of Fig. ^.
Other networks have a relatively small number of elements
characterized by a high value of their degree. This last
topology may be consistent with the topology observed
in a correlation based network of a random financial market.
On the other hand, the star-like topology is consistent
with a dynamical model defined as a one-factor model.

In summary, the study of correlation based financial
networks is a fruitful method able to extract economic
information from the correlation coefficient matrix of a
set of financial time series. The topology of the detected
network can be used to validate or falsify simple, although
widespread, market models.